Abstract

We consider the online linear programming problem in which the constraint matrix is revealed
column by column along with the objective function. We provide a $1 - o(1)$ competitive algorithm
for this surprisingly general class of online problems under the assumption of random order of
arrival and some mild conditions on the right-hand-side input. Our learning-based algorithm
works by dynamically updating a threshold price vector at geometric time intervals; the price
learned from the previous steps is used to determine the decision for the current step. Our result
provides a common near-optimal solution to a wide range of online problems, including online
routing and packing, online combinatorial auctions, online adwords matching, many secretary
problems, and various resource allocation and revenue management problems. Apart from online
problems, the algorithm can also be applied to the fast solution of large linear programs by
sampling the columns of the constraint matrix.



1 Introduction

Online optimization is attracting increasingly wide attention in the computer science, operations
research, and management science communities because of its wide applications to electronic
markets and dynamic resource allocation problems. In many practical problems, data does not
reveal itself at the beginning, but rather arrives in an online fashion. For example, in the online
combinatorial auction problem, consumers arrive sequentially requesting a subset of goods, each
offering a certain bid price for their demand. On observing a request, the seller needs to make an
irrevocable decision for that consumer with the overall objective of maximizing the revenue or
other social welfare while respecting the inventory constraints. Similarly, in the online routing
problem [1], the central organizer receives demands for subsets of edges in a network from the
users in sequential order, each with a certain utility and bid price for the demand. The organizer
needs to allocate the network capacity online to those bidders to maximize social welfare. A
similar format also appears in knapsack secretary problems [2], the online keyword matching
problem [?], online packing problems [4], and various other online resource allocation problems.

In all the examples mentioned above, the problem takes the form of an online linear program.
In an online linear programming problem, the constraint matrix is revealed column
by column together with the corresponding coefficient in the objective function. After observing
the input received so far, the online algorithm must make the current decision without observing
the future data. To be precise, consider the linear program

\[
\begin{array}{lll}
\text{maximize} & \sum_{j=1}^{n} \pi_j x_j & \\
\text{subject to} & \sum_{j=1}^{n} a_{ij} x_j \le b_i, & i = 1, \ldots, m \\
 & 0 \le x_j \le 1, & j = 1, \ldots, n
\end{array}
\tag{1}
\]

where, for all $j$, $\pi_j \ge 0$ and $a_j = (a_{ij})_{i=1}^{m} \in [0,1]^m$, and $b = \{b_i\}_{i=1}^{m} \in \mathbb{R}^m$. In the
corresponding online linear programming problem, at time $t$, the coefficients $(\pi_t, a_t)$ are
revealed, and the algorithm must make a decision $x_t$. Given the previous $t-1$ decisions
$x_1, \ldots, x_{t-1}$ and the input $\{\pi_j, a_j\}_{j=1}^{t}$ until time $t$, the $t$-th decision is to select $x_t$ such that

\[
\begin{array}{ll}
\sum_{j=1}^{t} a_{ij} x_j \le b_i, & i = 1, \ldots, m \\
0 \le x_t \le 1 &
\end{array}
\tag{2}
\]

The goal of the online algorithm is to choose the $x_t$'s such that the objective function
$\sum_{t=1}^{n} \pi_t x_t$ is maximized. Traditionally, an algorithm is analyzed on the basis of its worst-case
input. However, this leads to very pessimistic bounds for the above online problem: no online
algorithm can achieve better than an $O(1/n)$ approximation of the optimal offline solution [2].
Therefore, in this paper, we make the following enabling assumptions:

Assumption 1. The columns $a_j$ (with the objective coefficient $\pi_j$) arrive in a random order,
i.e., the set of columns can be adversarially picked at the start. However, after they are chosen,
$(a_1, a_2, \ldots, a_n)$ and $(a_{\sigma(1)}, a_{\sigma(2)}, \ldots, a_{\sigma(n)})$ are equally likely for every permutation $\sigma$.

Assumption 2. We know the total number of columns $n$ a priori.

The first assumption says that we consider the average behavior of the online algorithm over
random permutations. This assumption is reasonable in practical problems, since the order of
columns usually appears to be independent of the content of the columns. It is also a standard
assumption in much of the existing literature on online problems, most notably in the classical
secretary problems [2].

The second assumption seems restrictive, but it is necessary. In [?], the authors show that
if Assumption 2 does not hold, then the worst-case competitive ratio is bounded away from 1.
Note that all the existing algorithms for secretary problems make this assumption [2]. In our
algorithm, we will use this quantity $n$ to decide the length of history used to learn the threshold
prices. Of course, we can relax the assumption to knowledge of $n$ within at most $1 \pm \epsilon$
multiplicative error, without affecting the results. Moreover, for certain revenue management
and inventory control applications detailed below, the knowledge of $n$ can be replaced by the
knowledge of the length of the time horizon under certain arrival assumptions, which is usually
available in practice.

In this paper, we present an almost optimal online solution for (2) under the above assumptions.
We also extend our results to the following more general online linear optimization problems.
Consider a sequence of $n$ non-negative vectors $f_1, f_2, \ldots, f_n \in \mathbb{R}^k$, $mn$ non-negative vectors
$g_{i1}, g_{i2}, \ldots, g_{in} \in [0,1]^k$, $i = 1, \ldots, m$, and the $k$-dimensional simplex $K = \{x \in \mathbb{R}^k : x^T e \le 1\}$. In
this problem, at each time $t$ we make a $k$-dimensional decision $x_t \in \mathbb{R}^k$, satisfying:

\[
\begin{array}{ll}
\sum_{j=1}^{t} g_{ij}^T x_j \le b_i, & i = 1, \ldots, m \\
x_t \in K & \\
x_t \ge 0, \; x_t \in \mathbb{R}^k &
\end{array}
\tag{3}
\]

where the decision vector $x_t$ must be chosen using only the knowledge up to time $t$. The objective
is to maximize $\sum_{t=1}^{n} f_t^T x_t$ over the whole time horizon.

1.1 Our results



In the following, let OPT denote the optimal objective value for the offline problem (1).

Definition 1. An online algorithm $A$ is $c$-competitive in the random permutation model if the
expected value of the online solution obtained by using $A$ is at least a factor $c$ of the optimal
offline solution. That is,

\[
\mathbb{E}_{\sigma}\Big[\sum_{t=1}^{n} \pi_t x_t(\sigma, A)\Big] \ge c \cdot \mathrm{OPT}
\]

where the expectation is taken over uniformly random permutations $\sigma$ of $1, \ldots, n$, and $x_t(\sigma, A)$
is the $t$-th decision made by algorithm $A$ when the inputs arrive in the order $\sigma$.
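To make Definition 1 concrete, the following is a minimal sketch that evaluates the expectation over
all arrival orders by brute force on a tiny instance with a single unit-size constraint ($m = 1$,
$a_t = 1$, $B = k$). The online rule `sample_then_threshold` is a hypothetical stand-in chosen only
for illustration; it is not the algorithm analyzed in this paper.

```python
from itertools import permutations

def sample_then_threshold(values, k, s):
    """Hypothetical online rule: observe the first s values without accepting,
    then accept any later value above the best sampled one, up to k accepts."""
    threshold = max(values[:s]) if s > 0 else float("-inf")
    total, accepted = 0.0, 0
    for v in values[s:]:
        if accepted < k and v > threshold:
            total += v
            accepted += 1
    return total

def competitive_ratio(values, k, s):
    """Exact E_sigma[online value] / OPT, enumerating all n! arrival orders."""
    opt = sum(sorted(values, reverse=True)[:k])  # offline optimum: the k largest values
    orders = list(permutations(values))
    expected = sum(sample_then_threshold(list(o), k, s) for o in orders) / len(orders)
    return expected / opt

ratio = competitive_ratio([1.0, 2.0, 3.0, 4.0], k=1, s=1)
print(round(ratio, 5))
```

Enumeration is only feasible for very small $n$; it is used here because it computes the expectation
in the definition exactly rather than estimating it by sampling.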

Our algorithm is based on the observation that the optimal solution $x^*$ for the offline linear
program is almost entirely determined by the optimal dual solution $p^* \in \mathbb{R}^m$ corresponding to
the $m$ inequality constraints. The optimal dual solution acts as a threshold price so that $x^*_t > 0$
only if $\pi_t \ge p^{*T} a_t$. Our online algorithm works by learning a threshold price vector from the
input received so far. The price vector then determines the decision for the next step. However,
instead of computing a new price vector at every step, the algorithm initially waits until $\epsilon n$
steps, and then computes a new price vector every time the history doubles, that is, at time
steps $\epsilon n, 2\epsilon n, 4\epsilon n, \ldots$ and so on. We show that our algorithm is $1 - O(\epsilon)$-competitive in the
random permutation model under certain conditions on the input. Our main result is stated as follows:
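The doubling schedule described above can be sketched as follows. This is an illustrative helper,
not code from the paper: the function names are hypothetical, and the update times are taken as
$2^r \lceil \epsilon n \rceil$, which coincides with $2^r \epsilon n$ whenever $\epsilon n$ is an integer.

```python
import math

def price_update_times(n, eps):
    """Times ceil(eps*n), 2*ceil(eps*n), 4*ceil(eps*n), ... strictly below n."""
    t = math.ceil(n * eps)
    times = []
    while t < n:
        times.append(t)
        t *= 2
    return times

def active_epoch(t, n, eps):
    """Most recent update time strictly before step t, or None during the initial wait."""
    past = [l for l in price_update_times(n, eps) if l < t]
    return past[-1] if past else None

print(price_update_times(1024, 0.125))
print(active_epoch(300, 1024, 0.125))
```

With $n = 1024$ and $\epsilon = 1/8$ the price is recomputed only three times, so the number of linear
programs solved grows like $\log_2(1/\epsilon)$ rather than with $n$.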

Theorem 1. For any $\epsilon > 0$, our online algorithm is $1 - O(\epsilon)$ competitive for the online linear
program (2) in the random permutation model, for all inputs such that

\[
B = \min_i b_i \ge \Omega\Big(\frac{m \log(n/\epsilon)}{\epsilon^2}\Big)
\tag{4}
\]

We also give the following alternative condition for the theorem to hold:

Corollary 1. Theorem 1 still holds if condition (4) is replaced by

\[
B \ge \Omega\Big(\frac{(m\lambda + m^2) \log(1/\epsilon)}{\epsilon^2}\Big)
\tag{5}
\]

where $\lambda = \log\log(\pi_{\max}/\pi_{\min})$, $\pi_{\max} = \max_{j=1,\ldots,n} \pi_j$, and $\pi_{\min} = \min_{j=1,\ldots,n} \pi_j$.

Observe that the lower bound in the condition on $B$ depends on $\log(1/\epsilon)/\epsilon^2$. We
emphasize that this dependence on $\epsilon$ is near-optimal. In [10], the author proves that $k \ge 1/\epsilon^2$
is necessary to get a $1 - O(\epsilon)$ competitive ratio in the $k$-secretary problem, which is a special case
of our problem with $m = 1$, $B = k$ and $a_t = 1$ for all $t$.

We also extend our results to the more general online linear programs introduced in (3):

Theorem 2. For any $\epsilon > 0$, our algorithm is $1 - O(\epsilon)$ competitive for the general online linear
problem (3) in the random permutation model, for all inputs such that:

\[
B = \min_i b_i \ge \Omega\Big(\frac{m \log(nk/\epsilon)}{\epsilon^2}\Big)
\tag{6}
\]

Remark 1. The condition required for our main result is independent of the size of OPT and of the
objective coefficients, and our result is also independent of any possible distribution of the input
data. If the largest entry of the constraint coefficients does not equal 1, then both of our theorems
hold if the condition (4) or (6) is replaced by:

\[
\min_i \frac{b_i}{\bar{a}_i} \ge \Omega\Big(\frac{m \log(nk/\epsilon)}{\epsilon^2}\Big)
\]

where $\bar{a}_i = \max_j\{a_{ij}\}$ or $\bar{a}_i = \max_j\{\|g_{ij}\|_\infty\}$. Note that this bound is proportional only to
$\log(n)$, so it is far below the level that would be needed to satisfy everyone's demand.

It is apparent that our generalized problem formulation covers a wide range of online
decision making problems. In the next section, we discuss the related work and some of the
applications of our model. As one can see, our result indeed improves the competitive ratios for
many online problems studied in the literature.

1.2 Related work 

Online decision making has been a topic of wide recent interest in the computer science,
operations research, and management science communities. Various special cases of the general
problem presented in this paper have been studied extensively in the literature, especially in the
context of online auctions and secretary problems. Babaioff et al. [2] provide a comprehensive
survey of existing results on the secretary problems. In particular, constant factor competitive
ratios have been proven for $k$-secretary and knapsack secretary problems under the random
permutation model. Further, for many of these problems, a constant competitive ratio is known to
be optimal if no additional conditions on the input are assumed. Therefore, there has been recent
interest in searching for online algorithms whose competitive ratio approaches 1 as the input
parameters become large. The first result of this kind appears in [10], where a
$1 - O(1/\sqrt{k})$-competitive algorithm is presented for the $k$-secretary problem under the random
permutation model. More recently, Devanur et al. [7] presented a $1 - O(\epsilon)$-competitive algorithm
for the online adwords matching problem under the assumption of certain lower bounds on OPT in
terms of $\epsilon$ and other input parameters. In [7], the authors raise several open questions, including
the possibility of such near-optimal algorithms for a more general class of online problems. In our
work, we give an affirmative answer to this question by showing a $1 - O(\epsilon)$-competitive algorithm
for a large class of online linear programs under a weaker lower bound condition.

A common element in the techniques used in existing work on secretary problems [2] (with the
exception of Kleinberg [10]), online combinatorial auction problems [?], and the adwords matching
problem [7] has been one-time learning of threshold price(s) from the first $\epsilon n$ customers, which
are then used to determine the decision for the remaining customers. Our algorithm is based on
the same learning idea. However, in practice one would expect some benefit from dynamically
updating the prices as more and more information is revealed. The question then is how
often and when to update them. Our dynamic price update builds upon this intuition, and we
demonstrate that it is better to update the prices at geometric time intervals: not too soon and
not too late. In particular, we present an improvement from a factor of $1/\epsilon^3$ to $1/\epsilon^2$ in the lower
bound requirement by using dynamic price updating instead of one-time learning.

In our analysis, we apply many standard techniques from PAC-learning, in particular,
concentration bounds and covering arguments. These techniques were also heavily used in [5] and
[7]. In [5], the price learned from one half of the bidders is used for the other half to get an
incentive compatible mechanism for combinatorial auctions. Their approach is closely related to the
idea of one-time learning of prices in online auctions; however, their goal is offline revenue
maximization and an unlimited supply is assumed. As discussed above, [7] considers a special case of
our problem. Part of our analysis is inspired by some ideas used there, as will be pointed out
in the text.

1.3 Specific Applications

In the following, we show some of the applications of our algorithm. It is worth noting that for
many of the problems we discuss below, our algorithm is the first near-optimal algorithm under
the distribution-free model.

1.3.1 Online routing problems

The most direct application of our online algorithm is the online routing problem [4]. In this
problem, there are $m$ edges in a network, and each edge $i$ has a bounded capacity $b_i$. There are $n$
customers arriving online, each with a request for a certain path $a_t \in \{0,1\}^m$, where $a_{it} = 1$ if
the path of request $t$ contains edge $i$, and a utility $\pi_t$ for the request. The offline problem for
the decision maker is given by the following integer program:

\[
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{n} \pi_t x_t & \\
\text{subject to} & \sum_{t=1}^{n} a_{it} x_t \le b_i, & i = 1, \ldots, m \\
 & x_t \in \{0, 1\} &
\end{array}
\tag{7}
\]

By Theorem 1 and its natural extension to integer programs, as will be discussed in Section 4.2,
our algorithm gives a $1 - O(\epsilon)$ competitive solution to this problem in the random permutation
model as long as the edge capacity is reasonably large. Earlier, a best of $\log(m \pi_{\max}/\pi_{\min})$ competitive
algorithm was known for this problem under the worst-case input model [4].
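As an illustration of how a routing request maps onto a column of the integer program (7), the sketch
below builds $a_t$ from a list of edge indices and checks the capacity constraints for a given set of
decisions. The helper names and the zero-based edge numbering are assumptions made for this example.

```python
def path_to_column(path_edges, m):
    """Return a_t in {0,1}^m with a_t[i] = 1 iff edge i lies on the requested path."""
    column = [0] * m
    for i in path_edges:
        column[i] = 1
    return column

def is_feasible(columns, decisions, capacities):
    """Check sum_t a_it * x_t <= b_i for every edge i."""
    m = len(capacities)
    load = [0] * m
    for a_t, x_t in zip(columns, decisions):
        for i in range(m):
            load[i] += a_t[i] * x_t
    return all(load[i] <= capacities[i] for i in range(m))

# Three requests on a 3-edge network; edge 1 is shared by all of them.
cols = [path_to_column([0, 1], 3), path_to_column([1, 2], 3), path_to_column([1], 3)]
print(is_feasible(cols, [1, 1, 0], [1, 2, 1]))
print(is_feasible(cols, [1, 1, 1], [1, 2, 1]))
```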

1.3.2 Online single-minded combinatorial auctions 

In this problem, there are $m$ goods, and $b_i$ units of each good $i$ are available. There are $n$ bidders
arriving online, each with a bundle of items $a_t \in \{0,1\}^m$ that he desires to buy, and a limit
price $\pi_t$ for his bundle. The offline problem of maximizing social utility is the same as the routing
problem formulation given in (7). Due to the use of a threshold price mechanism, where the threshold
price for the $t$-th bidder is computed from the input of previous bidders, it is easy to show that
our $1 - O(\epsilon)$ competitive online mechanism also supports incentive compatibility and voluntary
participation. One can also easily transform this model to revenue maximization. A
$\log(m)$-competitive algorithm for this problem in the random permutation setting can be found in
recent work [?].

1.3.3 The online adwords problems 

The online adwords allocation problem is essentially the online matching problem. In this
problem, there are $n$ queries arriving online, and there are $m$ bidders, each with a daily budget
$b_i$ and a bid $\pi_{ij}$ on query $j$. For the $j$-th query, the decision vector $x_j$ is an $m$-dimensional vector,
where $x_{ij} \in \{0, 1\}$ indicates whether the $j$-th query is allocated to the $i$-th bidder. Also, since
every query can be allocated to at most one bidder, we have the constraint $x_j^T e \le 1$. Therefore,
the corresponding offline problem can be stated as:

\[
\begin{array}{lll}
\text{maximize} & \sum_{j=1}^{n} \pi_j^T x_j & \\
\text{subject to} & \sum_{j=1}^{n} \pi_{ij} x_{ij} \le b_i & \forall i \\
 & x_j^T e \le 1 & \\
 & x_j \in \{0, 1\}^m &
\end{array}
\]

The linear relaxation of the above problem is a special case of the general linear optimization
problem (3) with $f_j = \pi_j$ and $g_{ij} = \pi_{ij} e_i$, where $e_i$ is the $i$-th unit vector of all zeros except 1 for
the $i$-th entry. By Theorem 2 (and the remarks below the theorem), and the extension to integer
programs discussed in Section 4.2, our algorithm will give a $1 - O(\epsilon)$ approximation for this problem
given the lower bound condition that for all $i$, $\frac{b_i}{\pi_i^{\max}} \ge \frac{m \log(mn/\epsilon)}{\epsilon^2}$, where $\pi_i^{\max} = \max_j\{\pi_{ij}\}$ is the
largest bid by bidder $i$ among all queries.

Earlier, a $1 - O(\epsilon)$ algorithm was provided for this problem by [7]. We point out that
we have eliminated the condition on OPT obtained by [7] and only have a condition on $b_i$, which
can be checked before the execution, a property that is not provided by a condition of the
former type. Further, our lower bounds on $B$ are weaker by an $\epsilon$ factor than the one obtained
in [7]. Later, we show that this improvement is a result of dynamically learning the price
at geometric intervals, instead of the one-time learning in [7]. Richer models incorporating other
aspects of sponsored search, such as multiple slots, can be formulated by redefining $f$, $g$, $K$ to
obtain similar results.

1.3.4 Yield management problems 

The online yield management problem is to allocate perishable resources to demands in order to
increase the revenue by best matching, online, the resource capacity and the demand over a given
time horizon $T$ [?][?][11]. It has wide applications including airline booking, hotel reservation,
ticket selling, and media and internet resource allocation problems. In this problem, there are
several types of product $j$, $j = 1, 2, \ldots, J$, and several resources $b_i$, $i = 1, \ldots, m$. Selling a unit
of product $j$ consumes $r_{ij}$ units of resource $i$ for all $i$. Buyers demanding each
type of product arrive according to a stationary Poisson process, and each offers a price $\pi$ for his
or her unit product demand. The objective of the seller is to maximize his or her revenue in the time
horizon $T$ while respecting the given resource constraints. The offline problem is exactly the same
as the online routing problem:

\[
\begin{array}{ll}
\text{maximize} & \sum_{t} \pi_t x_t \\
\text{subject to} & \sum_{t} a_t x_t \le b, \\
 & x_t \in \{0, 1\}, \; \forall t,
\end{array}
\tag{8}
\]

where $a_t$ is a type of product $r_j$, $j = 1, \ldots, J$. In this problem, we may not know the exact
number $n$ of the total buyers in advance. But since the buyers arrive according to a stationary
Poisson process, they can be viewed as randomly ordered, and we can use the bids in the first
$\epsilon T$ time period to learn the optimal dual price and apply it to the remaining time horizon. Given
$T$ large enough, the number of buyers in $\epsilon T$ time will be approximately $\epsilon n$, and therefore our
algorithm will give a near-optimal solution to this problem, even though the arrival rate of
buyers for each product is not known to the seller.

1.3.5 Inventory control problems with replenishment 

This problem is similar to the multi-period yield management problem discussed in the previous
subsection. The seller has $m$ items to sell, and each time a bidder comes, requests a certain
bundle of items $a_j$, and offers a price $\pi_j$. In this problem, we have periodic selling periods. In
each period, the seller has to choose an inventory $b$ at the beginning, and then allocate the demand
of the buyers during this period. Each unit of $b_i$ costs capital $c_i$, and the total investment
$\sum_i b_i c_i$ in each period is limited by a budget $C$. There are many periods in the whole time horizon,
and the objective is to maximize the total revenue over the whole time horizon. The offline problem
of each period, for all bidders arriving in that period, is as follows:

\[
\begin{array}{ll}
\text{maximize} & \sum_{t} \pi_t x_t \\
\text{subject to} & \sum_{t} a_t x_t \le b, \\
 & 0 \le x_t \le 1, \; \forall t
\end{array}
\tag{9}
\]

Note that given $b$, the problem for one period is exactly as we discussed before. Given that the
bids come in a random permutation over the total time horizon, our analysis will show that
the itemized demands $b$ for the previous period will be approximately the same for the remaining
periods, and that pricing the bids online for the remaining periods based on the itemized dual prices
learned from the previous period gives a revenue that is close to the optimal revenue of the
offline problem over the whole time horizon.

The rest of the paper is organized as follows. In Sections 2 and 3 we present our online
algorithm and prove that it achieves a $1 - O(\epsilon)$ competitive ratio under mild conditions on the
input. To keep the discussion simple and clear, we start in Section 2 with a simpler one-time
learning algorithm. While the analysis for this simpler algorithm will be useful to demonstrate
our proof techniques, the results obtained in this setting are weaker than those obtained by
our dynamic price update algorithm, which is discussed in Section 3. In Section 4 we present
the extension to multidimensional online linear programs and the applicability of our model to
solving large static linear programs. Then we conclude our paper.



2 One-time learning algorithm 

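For the linear program (1), we use $p \in \mathbb{R}^m$ to denote the dual variable associated with the first
set of constraints $\sum_t a_t x_t \le b$. Let $\hat{p}$ denote the optimal dual solution to the following partial
linear program defined only on the input until time $s = \lceil n\epsilon \rceil$:

\[
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{s} \pi_t x_t & \\
\text{subject to} & \sum_{t=1}^{s} a_{it} x_t \le (1 - \epsilon)\frac{s}{n} b_i, & i = 1, \ldots, m \\
 & 0 \le x_t \le 1, & t = 1, \ldots, s
\end{array}
\tag{10}
\]

Also, for any given dual price vector $p$, define the allocation rule $x_t(p)$ as:

\[
x_t(p) =
\begin{cases}
0 & \text{if } \pi_t \le p^T a_t \\
1 & \text{if } \pi_t > p^T a_t
\end{cases}
\tag{11}
\]

Our one-time learning algorithm can now be stated as follows: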




Algorithm 1 One-time Learning Algorithm (OLA)

1. Initialize $s = \lceil n\epsilon \rceil$, $x_t = 0$ for all $t \le s$. And $\hat{p}$ is defined as above.

2. Repeat for $t = s + 1, s + 2, \ldots$

   (a) If $a_{it} x_t(\hat{p}) \le b_i - \sum_{j=1}^{t-1} a_{ij} x_j$ for all $i$, set $x_t = x_t(\hat{p})$; otherwise,
       set $x_t = 0$. Output $x_t$.



This algorithm learns a dual price vector using the first $\lceil n\epsilon \rceil$ arrivals. Then, at each time
$t > \lceil n\epsilon \rceil$, it uses this price vector to decide the current allocation, and executes this decision as
long as it does not violate any of the constraints. An attractive feature of this algorithm is that
it requires solving a linear program only once, and the linear program it solves is significantly
smaller, defined only on $\lceil n\epsilon \rceil$ variables. In the next subsection, we prove the following proposition
regarding the competitive ratio of this algorithm, which relies on a stronger condition than
Theorem 1.
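The decision phase of the one-time learning algorithm can be sketched as below. This is a minimal
illustration under stated assumptions: the learned price `p_hat` is passed in as an argument (solving
the sample linear program that defines it is not shown), indices are zero-based, and the function
names are hypothetical.

```python
def allocation_rule(pi_t, a_t, p):
    """x_t(p): accept (1) only if the bid strictly exceeds the priced cost p^T a_t."""
    cost = sum(p_i * a_i for p_i, a_i in zip(p, a_t))
    return 1 if pi_t > cost else 0

def one_time_learning_decisions(pis, cols, b, p_hat, s):
    """Decision phase for a given learned price p_hat (learning step not shown).

    The first s arrivals are rejected; afterwards x_t(p_hat) is executed whenever
    it keeps every constraint i within its remaining capacity.
    """
    m = len(b)
    used = [0.0] * m
    decisions = []
    for t, (pi_t, a_t) in enumerate(zip(pis, cols)):
        x_t = 0
        if t >= s:  # zero-based index t corresponds to arrivals s+1, s+2, ... in the text
            cand = allocation_rule(pi_t, a_t, p_hat)
            if all(a_t[i] * cand <= b[i] - used[i] for i in range(m)):
                x_t = cand
        for i in range(m):
            used[i] += a_t[i] * x_t
        decisions.append(x_t)
    return decisions

# Single resource with capacity 2 and an assumed learned price of 2.5.
print(one_time_learning_decisions([5, 1, 4, 3, 6], [[1]] * 5, [2], [2.5], s=1))
```

In this toy run the last bid exceeds the price but is rejected because the capacity is already
used up, which is the constraint check in step 2(a).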

Proposition 1. For any $\epsilon > 0$, the one-time learning algorithm is $1 - 6\epsilon$ competitive for online
linear programming, if

\[
B = \min_i b_i \ge \frac{6 m \log(n/\epsilon)}{\epsilon^3}
\tag{12}
\]

2.1 Competitive ratio analysis 

Observe that the one-time learning algorithm waits until time $s = \lceil n\epsilon \rceil$, and then sets the
solution at time $t$ as $x_t(\hat{p})$, unless there is a constraint violation. To prove the competitive ratio
of this algorithm, we first prove that with high probability, $x_t(\hat{p})$ satisfies all the constraints of
the linear program. Then, we show that the expected value $\sum_t \pi_t x_t(\hat{p})$ is close to the optimal
offline objective value. For simplicity of the discussion, we assume $s = n\epsilon$ in the following.

To start with, we observe that if $p^*$ is the optimal dual solution to (1), then $\{x_t(p^*)\}$ is close
to the primal optimal solution $x^*$. That is, learning the dual price is sufficient to determine the
primal solution. We make the following simplifying assumption:

Assumption 3. The inputs of this problem are in general position, namely for any price vector
$p$, there can be at most $m$ columns such that $p^T a_t = \pi_t$.

The assumption is not necessarily true for all inputs. However, one can always randomly
perturb $\pi_t$ by an arbitrarily small amount $\eta$ by adding a random variable $\xi_t$ uniformly
distributed on the interval $[0, \eta]$. In this way, with probability 1, no $p$ can satisfy $m + 1$ equations
simultaneously among $p^T a_t = \pi_t$, and the effect of this perturbation on the objective can be
made arbitrarily small.² Given this assumption, we can use the complementarity conditions of the
linear program (1) to observe that:

Lemma 1. $x_t(p^*) \le x^*_t$, and under Assumption 3, $x^*_t$ and $x_t(p^*)$ differ on at most $m$ values
of $t$.
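Lemma 1 can be checked by hand in the single-constraint case $m = 1$, where the offline linear program
is a fractional knapsack and an optimal dual price is the value-to-size ratio of the item at which the
capacity runs out. The sketch below is an illustration under those assumptions (all sizes positive,
distinct ratios); the function names are hypothetical and it does not handle general $m$.

```python
def fractional_knapsack(pis, sizes, b):
    """Solve max sum pi_t x_t s.t. sum a_t x_t <= b, 0 <= x_t <= 1 (one constraint).

    Returns (x_star, p_star). Assumes every size a_t > 0 and distinct ratios pi_t / a_t.
    """
    order = sorted(range(len(pis)), key=lambda t: pis[t] / sizes[t], reverse=True)
    x = [0.0] * len(pis)
    remaining = b
    p_star = 0.0  # the dual price stays 0 if the capacity is never exhausted
    for t in order:
        if sizes[t] < remaining:
            x[t] = 1.0
            remaining -= sizes[t]
        else:
            x[t] = remaining / sizes[t]      # critical item, possibly fractional
            p_star = pis[t] / sizes[t]       # its ratio is an optimal dual price
            break
    return x, p_star

def threshold_solution(pis, sizes, p):
    """x_t(p) for m = 1: accept only if pi_t > p * a_t."""
    return [1 if pi > p * a else 0 for pi, a in zip(pis, sizes)]

pis, sizes, b = [10, 6, 7, 3], [2, 1, 2, 1], 4
x_star, p_star = fractional_knapsack(pis, sizes, b)
x_thr = threshold_solution(pis, sizes, p_star)
print(x_star, p_star, x_thr)
```

On this instance the threshold solution agrees with the optimal primal solution everywhere except on
the single fractional item, matching the bound of at most $m = 1$ differing entries.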

However, in the online algorithm, we use the price $\hat{p}$ learned from the first few inputs, instead
of the optimal dual price. The remaining discussion attempts to show that the learned price
will be sufficiently accurate for our purpose. Note that the random order assumption can be
interpreted to mean that the first $s$ inputs are a uniform random sample without replacement
of size $s$ from the $n$ inputs. Let $S$ denote this sample set of size $s$, and $N$ denote the complete
set of size $n$. Consider the sample linear program (10) defined on the sample set $S$ with the
right-hand side set as $(1 - \epsilon)\epsilon b$. Then, $\hat{p}$ was constructed as the optimal dual price of the sample
linear program, which we refer to as the sample dual price. We prove the following lemma:

Lemma 2. The primal solution constructed using the sample dual price is a feasible solution to the
linear program (1) with high probability. More precisely, with probability $1 - \epsilon$,

\[
\sum_{t=1}^{n} a_{it} x_t(\hat{p}) \le b_i, \quad \forall i = 1, \ldots, m
\]

given $B \ge \frac{6 m \log(n/\epsilon)}{\epsilon^3}$.

Proof. The proof will proceed as follows: Consider any fixed price $p$. We say a random sample
$S$ is "bad" for this $p$ if and only if $p = \hat{p}(S)$, but $\sum_{t=1}^{n} a_{it} x_t(p) > b_i$ for some $i$. First, we show
that the probability of bad samples is small for every fixed $p$. Then, we take a union bound over
all "distinct" prices to prove that with high probability the learned price $\hat{p}$ will be such that
$\sum_{t=1}^{n} a_{it} x_t(\hat{p}) \le b_i$ for all $i$.

To start with, we fix $p$ and $i$. Define $Y_t = a_{it} x_t(p)$. If $p$ is an optimal dual solution for the
sample linear program on $S$, then by the complementarity conditions, we have:

\[
\sum_{t \in S} Y_t = \sum_{t \in S} a_{it} x_t(p) \le (1 - \epsilon)\epsilon b_i
\tag{13}
\]

Therefore, the probability of bad samples is bounded by:

\[
\begin{aligned}
P\Big(\sum_{t \in S} Y_t \le (1 - \epsilon)\epsilon b_i, \; \sum_{t \in N} Y_t \ge b_i\Big)
&\le P\Big(\Big|\sum_{t \in S} Y_t - \epsilon \sum_{t \in N} Y_t\Big| \ge \epsilon^2 b_i \;\Big|\; \sum_{t \in N} Y_t = b_i\Big) \\
&\le 2\exp\Big(-\frac{\epsilon^3 b_i}{2 + \epsilon}\Big) \le \delta
\end{aligned}
\tag{14}
\]

where $\delta = \frac{\epsilon}{m \cdot n^m}$. The last step follows from the Hoeffding-Bernstein inequality (Lemma 10 in
Appendix A) and the assumption made on $B$.

Next, we take a union bound over all "distinct" $p$'s. We call two price vectors $p$ and $q$
distinct if and only if they result in distinct solutions, i.e., $\{x_t(p)\} \ne \{x_t(q)\}$. Note that we
only need to consider distinct prices, since otherwise all the $Y_t$'s are exactly the same. Thus,
each distinct $p$ is characterized by a unique separation of the $n$ points $\{(a_t, \pi_t)\}_{t=1}^{n}$ in $m$-dimensional
space by a hyperplane. By results from computational geometry [12], the total number of such
distinct prices is at most $n^m$. Taking a union bound over the $n^m$ distinct prices and $i = 1, \ldots, m$, we
get the desired result. □

2 This technique for resolving ties was also used in [7].
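For completeness, the arithmetic of the union bound can be written out as follows, assuming the
per-price, per-constraint failure probability is at most $\delta = \epsilon/(m \cdot n^m)$ as in the proof, and that
there are at most $n^m$ distinct prices:

```latex
\[
P\Big(\exists\, i:\ \sum_{t=1}^{n} a_{it}\, x_t(\hat{p}) > b_i\Big)
\;\le\; \sum_{\text{distinct } p}\ \sum_{i=1}^{m} \delta
\;\le\; n^m \cdot m \cdot \frac{\epsilon}{m \cdot n^m}
\;=\; \epsilon .
\]
```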

Above we showed that with high probability, $x_t(\hat{p})$ is a feasible solution. In the following, we
show that it is in fact a near-optimal solution if we include the objective value in the learning
part.

Lemma 3. The primal solution constructed using the sample dual price is a near-optimal solution
to the linear program (1) with high probability. More precisely, with probability $1 - \epsilon$,

\[
\sum_{t \in N} \pi_t x_t(\hat{p}) \ge (1 - 3\epsilon)\,\mathrm{OPT}
\tag{15}
\]

given $B \ge \frac{6 m \log(n/\epsilon)}{\epsilon^3}$.

Proof. The proof of this lemma is based on two observations. First, $\{x_t(\hat{p})\}_{t=1}^{n}$ and $\hat{p}$ satisfy
all the complementarity conditions, and hence form an optimal primal-dual solution of the following
linear program

\[
\begin{array}{lll}
\text{maximize} & \sum_{t \in N} \pi_t x_t & \\
\text{subject to} & \sum_{t \in N} a_{it} x_t \le \hat{b}_i, & i = 1, \ldots, m \\
 & 0 \le x_t \le 1, & t = 1, \ldots, n
\end{array}
\tag{16}
\]

where $\hat{b}_i = \sum_{t \in N} a_{it} x_t(\hat{p})$ if $\hat{p}_i > 0$, and $\hat{b}_i = b_i$ if $\hat{p}_i = 0$.

Second, one can show that if $\hat{p}_i > 0$, then with probability $1 - \epsilon$, $\hat{b}_i \ge (1 - 3\epsilon) b_i$. To see this,
observe that $\hat{p}$ is an optimal dual solution of the sample linear program on the set $S$, and let $\hat{x}$ be
the optimal primal solution. Then, by the complementarity conditions of the linear program, if
$\hat{p}_i > 0$, the $i$-th constraint must be satisfied with equality. That is, $\sum_{t \in S} a_{it} \hat{x}_t = (1 - \epsilon)\epsilon b_i$. Then,
given the observation made in Lemma 1 and that $B = \min_i b_i \ge \frac{m}{\epsilon^2}$, we get:

\[
\sum_{t \in S} a_{it} x_t(\hat{p}) \ge \sum_{t \in S} a_{it} \hat{x}_t - m \ge (1 - 2\epsilon)\epsilon b_i .
\tag{17}
\]

Then, using the Hoeffding-Bernstein inequality, in a manner similar to the proof of Lemma 2,
we can show that (the proof is given in Appendix A.3) given the lower bound on $B$, with
probability at least $1 - \epsilon$:

\[
\hat{b}_i = \sum_{t \in N} a_{it} x_t(\hat{p}) \ge (1 - 3\epsilon) b_i
\tag{18}
\]

Lastly, observe that whenever (18) holds, given an optimal solution $x^*$ to (1), $(1 - 3\epsilon) x^*$
will be a feasible solution to (16). Therefore, the optimal value of (16) is at least $(1 - 3\epsilon)\mathrm{OPT}$,
which is equivalent to saying that

\[
\sum_{t=1}^{n} \pi_t x_t(\hat{p}) \ge (1 - 3\epsilon)\,\mathrm{OPT}
\]

□

Therefore, the objective value for the online solution taken over the entire period $\{1, \ldots, n\}$ is
near optimal. However, the online algorithm does not make any decision in the learning period $S$,
and only the decisions from the period $\{s + 1, \ldots, n\}$ contribute to the objective value. The
following lemma, which relates the sample optimal value to the optimal value of a linear program,
will bound the contribution from the learning period:

Lemma 4. Let OPT($S$) denote the optimal value of the linear program (10) over the sample $S$,
and OPT($N$) denote the optimal value of the offline linear program (1) over $N$. Then,

\[
\mathbb{E}[\mathrm{OPT}(S)] \le \epsilon\,\mathrm{OPT}(N)
\]

Proof. Let $(x^*, p^*, y^*)$ and $(\hat{x}, \hat{p}, \hat{y})$ denote the optimal primal-dual solutions of the linear
program (1) on $N$ and the sample linear program on $S$, respectively.

\[
\begin{array}{lll}
(p^*, y^*) = & \arg\min & b^T p + \sum_{t \in N} y_t \\
 & \text{s.t.} & p^T a_t + y_t \ge \pi_t, \quad t \in N \\
 & & p, y \ge 0
\end{array}
\qquad
\begin{array}{lll}
(\hat{p}, \hat{y}) = & \arg\min & (1 - \epsilon)\epsilon\, b^T p + \sum_{t \in S} y_t \\
 & \text{s.t.} & p^T a_t + y_t \ge \pi_t, \quad t \in S \\
 & & p, y \ge 0
\end{array}
\]

Since $S \subseteq N$, $(p^*, y^*)$ is a feasible solution to the dual of the linear program on $S$. By the weak
duality theorem:

\[
\mathrm{OPT}(S) \le \epsilon\, b^T p^* + \sum_{t \in S} y^*_t
\]

Therefore,

\[
\mathbb{E}[\mathrm{OPT}(S)] \le \epsilon\, b^T p^* + \mathbb{E}\Big[\sum_{t \in S} y^*_t\Big]
= \epsilon\Big(b^T p^* + \sum_{t \in N} y^*_t\Big) = \epsilon\,\mathrm{OPT}(N)
\]

□
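The inequality of Lemma 4 can be checked numerically on a small single-constraint instance by
enumerating every sample $S$ of size $s = \epsilon n$. The sketch below is an illustration only: it assumes
$m = 1$ with positive sizes, so that each linear program reduces to a fractional knapsack solved
greedily, and the helper names are hypothetical.

```python
from itertools import combinations

def lp_value(pis, sizes, b):
    """Optimal value of max sum pi_t x_t s.t. sum a_t x_t <= b, 0 <= x_t <= 1 (m = 1)."""
    value, remaining = 0.0, b
    for pi, a in sorted(zip(pis, sizes), key=lambda z: z[0] / z[1], reverse=True):
        take = min(1.0, remaining / a)
        value += take * pi
        remaining -= take * a
        if remaining <= 0:
            break
    return value

def check_lemma4(pis, sizes, b, eps):
    """Return (E[OPT(S)], eps * OPT(N)) with the sample right-hand side (1 - eps) * eps * b."""
    n = len(pis)
    s = round(eps * n)
    opt_n = lp_value(pis, sizes, b)
    samples = list(combinations(range(n), s))
    mean_opt_s = sum(
        lp_value([pis[t] for t in S], [sizes[t] for t in S], (1 - eps) * eps * b)
        for S in samples
    ) / len(samples)
    return mean_opt_s, eps * opt_n

mean_opt_s, bound = check_lemma4([10, 6, 7, 3, 8, 2], [1] * 6, b=2, eps=0.5)
print(mean_opt_s, bound)
```

Averaging over all subsets of size $s$ is equivalent to averaging over uniformly random arrival
orders, since the first $s$ arrivals form a uniform sample without replacement.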

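Now, we are ready to prove Proposition 1.

Proof of Proposition 1. Using Lemma 2 and Lemma 3, with probability at least $1 - 2\epsilon$,
the following events happen:

\[
\sum_{t=1}^{n} a_{it} x_t(\hat{p}) \le b_i, \quad i = 1, \ldots, m
\]

and

\[
\sum_{t=1}^{n} \pi_t x_t(\hat{p}) \ge (1 - 3\epsilon)\,\mathrm{OPT}
\]

That is, the decisions $x_t(\hat{p})$ are feasible and the objective value taken over the entire period
$\{1, \ldots, n\}$ is near optimal. Denote this event by $\mathcal{E}$, where $\Pr(\mathcal{E}) \ge 1 - 2\epsilon$. We have by Lemma
2, Lemma 3 and Lemma 4:

\[
\begin{aligned}
\mathbb{E}\Big[\sum_{t \in N \setminus S} \pi_t x_t(\hat{p}) \,\Big|\, \mathcal{E}\Big]
&\ge (1 - 3\epsilon)\,\mathrm{OPT} - \mathbb{E}\Big[\sum_{t \in S} \pi_t x_t(\hat{p}) \,\Big|\, \mathcal{E}\Big] \\
&\ge (1 - 3\epsilon)\,\mathrm{OPT} - \frac{1}{1 - 2\epsilon}\,\mathbb{E}\Big[\sum_{t \in S} \pi_t x_t(\hat{p})\Big] \\
&\ge (1 - 3\epsilon)\,\mathrm{OPT} - \frac{\epsilon\,\mathrm{OPT}}{1 - 2\epsilon}
\end{aligned}
\tag{19}
\]

Therefore,

\[
\mathbb{E}\Big[\sum_{t=s+1}^{n} \pi_t x_t\Big]
\ge \mathbb{E}\Big[\sum_{t \in N \setminus S} \pi_t x_t(\hat{p}) \,\Big|\, \mathcal{E}\Big] \cdot \Pr(\mathcal{E})
\ge (1 - 6\epsilon)\,\mathrm{OPT}
\]

□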
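The last inequality in the proof is a short calculation. Assuming $\epsilon < 1/2$ (so that $1 - 2\epsilon > 0$),
multiplying the conditional bound $(1 - 3\epsilon)\mathrm{OPT} - \epsilon\,\mathrm{OPT}/(1 - 2\epsilon)$ by $\Pr(\mathcal{E}) \ge 1 - 2\epsilon$ gives:

```latex
\[
\Big((1 - 3\epsilon) - \frac{\epsilon}{1 - 2\epsilon}\Big)(1 - 2\epsilon)
= (1 - 3\epsilon)(1 - 2\epsilon) - \epsilon
= 1 - 6\epsilon + 6\epsilon^2
\;\ge\; 1 - 6\epsilon .
\]
```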



3 Dynamic price update algorithm 

The algorithm discussed in the previous section uses the first $\epsilon n$ inputs to learn the price, and
then applies it in the remaining time horizon. While this one-time learning algorithm has its
own merits, in particular that it requires solving only a small linear program defined on $\epsilon n$
variables, the lower bound required on $B$ is stronger than that claimed in Theorem 1 by an $\epsilon$ factor.

In this section, we discuss an improved dynamic price update algorithm that achieves
the result in Theorem 1. Instead of computing the price only once, the algorithm updates
the price every time the history doubles, that is, it learns a new price at time periods
$t = \epsilon n, 2\epsilon n, 4\epsilon n, \ldots$. To be precise, let $\hat{p}^{\ell}$ denote the optimal dual solution for the following
partial linear program defined only on the input until time $\ell$:

\[
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{\ell} \pi_t x_t & \\
\text{subject to} & \sum_{t=1}^{\ell} a_{it} x_t \le (1 - h_\ell)\frac{\ell}{n} b_i, & i = 1, \ldots, m \\
 & 0 \le x_t \le 1, & t = 1, \ldots, \ell
\end{array}
\tag{20}
\]

where the set of numbers $h_\ell$ is defined as follows:

\[
h_\ell = \epsilon \sqrt{\frac{n}{\ell}}
\tag{21}
\]

Also, for any given dual price vector $p$, define the allocation rule $x_t(p)$ as earlier in (11). Then,
our dynamic price update algorithm can be stated as follows:



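Algorithm 2 Dynamic Pricing Algorithm (DPA)

1. Initialize $t_0 = \lceil n\epsilon \rceil$, $x_t = \hat{x}_t = 0$ for all $t \le t_0$.

2. Repeat for $t = t_0 + 1, t_0 + 2, \ldots$

   (a) Set $\hat{x}_t = x_t(\hat{p}^{\ell})$, where $\ell = \lceil 2^r n\epsilon \rceil$ for the largest integer $r$ such that $\ell < t$.

   (b) If $a_{it} \hat{x}_t \le b_i - \sum_{j=1}^{t-1} a_{ij} x_j$ for all $i$, set $x_t = \hat{x}_t$; otherwise, set $x_t = 0$. Output $x_t$.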

Note that we update the price $\lceil \log_2(1/\epsilon) \rceil$ times during the whole time horizon. Thus, the
algorithm requires more computation, but as we show next, it requires a weaker lower bound on
$B$ for proving the same competitive ratio. The intuition behind this improvement is as follows.
Initially, at $\ell = n\epsilon$, $h_\ell = \sqrt{\epsilon} > \epsilon$. Thus, more slack is available, and so the large
deviation argument for constraint satisfaction (as in Lemma 5) requires a weaker condition on
$B$. The numbers $h_\ell$ decrease as $\ell$ increases. However, for large values of $\ell$, the sample size is larger,
making the weaker condition on $B$ sufficient for our purpose. Also, the $h_\ell$ decrease rapidly enough
that the overall loss in the objective value is not significant. The careful choice of the numbers
$h_\ell$ plays a key role in proving our results.
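As an illustration, the following Python sketch implements Algorithm 2 for the special case of a single resource ($m = 1$) with $a_t > 0$. In that case the dual price of the partial program (20) can be read off a greedy fractional-knapsack fill. The function names `threshold_price` and `dynamic_pricing`, the restriction to $m = 1$, and the allocation rule "accept if $\pi_t > p\,a_t$" are assumptions of this sketch; for general $m$, an LP solver that returns dual values would take the place of `threshold_price`.

```python
import math


def threshold_price(pi, a, capacity):
    """Dual price of: max sum(pi*x) s.t. sum(a*x) <= capacity, 0 <= x <= 1 (one resource)."""
    # Greedy fractional knapsack: fill in decreasing order of pi/a. The dual price is
    # the ratio of the item at which the capacity runs out.
    order = sorted(range(len(pi)), key=lambda t: pi[t] / a[t], reverse=True)
    used = 0.0
    for t in order:
        used += a[t]
        if used >= capacity:
            return pi[t] / a[t]
    return 0.0  # the capacity never binds, so the resource has zero price


def dynamic_pricing(pi, a, b, eps):
    """Sketch of Algorithm 2 for m = 1; returns the 0/1 decisions x_1..x_n."""
    n = len(pi)
    t0 = math.ceil(n * eps)
    x = [0] * n                # x_t = 0 during the first t0 periods
    used, price, r = 0.0, 0.0, 0
    next_update = t0           # history length ell at which the price is re-learned
    for t in range(t0, n):     # index t is period t + 1; t inputs have been observed
        if t >= next_update:
            ell = next_update
            h = eps * math.sqrt(n / ell)                      # h_ell from (21)
            price = threshold_price(pi[:ell], a[:ell], (1 - h) * ell / n * b)
            r += 1
            next_update = math.ceil(2 ** r * n * eps)
        x_hat = 1 if pi[t] > price * a[t] else 0              # tentative decision
        if a[t] * x_hat <= b - used:                          # step (b): feasibility check
            x[t] = x_hat
            used += a[t] * x_hat
    return x
```

Because step (b) is enforced on every period, the returned decisions never exceed the budget `b`, regardless of how well the learned price performs.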

3.1 Competitive ratio analysis 

The analysis of the dynamic algorithm proceeds in a manner similar to that of the one-time
learning algorithm. However, stronger results for the price learned in each period need
to be proven here. In this proof, for simplicity of discussion, we assume that $\epsilon = 2^{-E}$ for
some integer $E$, and that the numbers $\ell = 2^r n\epsilon$ for $r = 0, 1, 2, \ldots, E-1$ are all integers. Let
$L = \{n\epsilon, 2n\epsilon, \ldots, 2^{E-1} n\epsilon\}$.

Lemmas 5 and 6 are parallel to Lemmas 2 and 3 in the previous section; however, they require
a weaker condition on $B$:

Lemma 5. For any $\epsilon > 0$, with probability $1 - \epsilon$:

\[
\sum_{t=\ell+1}^{2\ell} a_{it} x_t(\hat{p}^{\ell}) \le \frac{\ell}{n} b_i, \quad \text{for all } i \in \{1, \ldots, m\},\ \ell \in L,
\]

given $B = \min_i b_i \ge \frac{20 m \log n}{\epsilon^2}$.

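Proof. To start with, we fix $p$ and $\ell$. This time, we say a random order is "bad" for this $p$ if and
only if there exists $\ell \in L$ such that $p = \hat{p}^{\ell}$ but $\sum_{t=\ell+1}^{2\ell} a_{it} x_t(\hat{p}^{\ell}) > \frac{\ell}{n} b_i$ for some $i$. Consider the
$i$-th component $\sum_t a_{it} \hat{x}_t$ for a fixed $i$. For ease of notation, we temporarily omit the subscript
$i$. Define $Y_t = a_t x_t(p)$. If $p$ is an optimal dual solution for (20), then by the complementarity
conditions, we have:

\[
\sum_{t=1}^{\ell} Y_t = \sum_{t=1}^{\ell} a_t x_t(p) \le (1 - h_\ell)\frac{b\ell}{n} \tag{22}
\]

Therefore, the probability of "bad" permutations for $p$ is bounded by:

\begin{align*}
& P\Big(\sum_{t=1}^{\ell} Y_t \le (1 - h_\ell)\tfrac{b\ell}{n},\ \sum_{t=\ell+1}^{2\ell} Y_t \ge \tfrac{b\ell}{n}\Big) \\
&\le P\Big(\sum_{t=1}^{\ell} Y_t \le (1 - h_\ell)\tfrac{b\ell}{n},\ \sum_{t=1}^{2\ell} Y_t \ge \tfrac{2b\ell}{n}\Big)
 + P\Big(\Big|\sum_{t=1}^{\ell} Y_t - \tfrac{1}{2}\sum_{t=1}^{2\ell} Y_t\Big| \ge \tfrac{h_\ell}{2}\tfrac{b\ell}{n},\ \sum_{t=1}^{2\ell} Y_t \le \tfrac{2b\ell}{n}\Big)
\end{align*}

Define $\delta = \frac{\epsilon}{m \cdot n^m \cdot E}$. Using the Hoeffding-Bernstein inequality (Lemma 10 in the appendix, here $R = 2\ell$,
$\sigma_R^2 \le b/n$, and $\Delta_R \le 1$), we have:

\begin{align*}
P\Big(\sum_{t=1}^{\ell} Y_t \le (1 - h_\ell)\tfrac{b\ell}{n},\ \sum_{t=1}^{2\ell} Y_t \ge \tfrac{2b\ell}{n}\Big)
&\le P\Big(\sum_{t=1}^{\ell} Y_t \le (1 - h_\ell)\tfrac{b\ell}{n} \,\Big|\, \sum_{t=1}^{2\ell} Y_t \ge \tfrac{2b\ell}{n}\Big) \\
&\le P\Big(\sum_{t=1}^{\ell} Y_t \le (1 - h_\ell)\tfrac{b\ell}{n} \,\Big|\, \sum_{t=1}^{2\ell} Y_t = \tfrac{2b\ell}{n}\Big) \\
&\le P\Big(\Big|\sum_{t=1}^{\ell} Y_t - \tfrac{1}{2}\sum_{t=1}^{2\ell} Y_t\Big| \ge h_\ell\tfrac{b\ell}{n} \,\Big|\, \sum_{t=1}^{2\ell} Y_t = \tfrac{2b\ell}{n}\Big) \\
&\le 2\exp\Big(-\frac{\epsilon^2 b}{2 + h_\ell}\Big) \le \frac{\delta}{2}
\end{align*}

and

\[
P\Big(\Big|\sum_{t=1}^{\ell} Y_t - \tfrac{1}{2}\sum_{t=1}^{2\ell} Y_t\Big| \ge \tfrac{h_\ell}{2}\tfrac{b\ell}{n},\ \sum_{t=1}^{2\ell} Y_t \le \tfrac{2b\ell}{n}\Big) \le 2\exp\Big(-\frac{\epsilon^2 b}{8 + 2h_\ell}\Big) \le \frac{\delta}{2}
\]

where the last steps hold because $h_\ell \le 1$ and by the assumption made on $B$.

Next, taking a union bound over the $n^m$ distinct prices, $i = 1, \ldots, m$, and the $E$ values of $\ell$, the
lemma is proved. □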
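As a worked step, the exponents in the two tail bounds can be obtained by substituting into Lemma 10. The derivation below is a sketch under the parameter values stated in the proof (sample size $r = \ell$ drawn from $R = 2\ell$ values, $\sigma_R^2 \le b/n$, $\Delta_R \le 1$); the choice of deviation $t$ for each case is our reading of the proof, not a quotation from it.

```latex
% First bound: deviation t = h_\ell b\ell/n, using h_\ell^2 \ell / n = \epsilon^2 from (21)
\frac{t^2}{2 r \sigma_R^2 + t \Delta_R}
  \ge \frac{(h_\ell b \ell / n)^2}{2\ell b/n + h_\ell b \ell/n}
  = \frac{h_\ell^2 (\ell/n)\, b}{2 + h_\ell}
  = \frac{\epsilon^2 b}{2 + h_\ell}.

% Second bound: deviation t = (h_\ell/2)\, b\ell/n
\frac{t^2}{2 r \sigma_R^2 + t \Delta_R}
  \ge \frac{(h_\ell b \ell / n)^2 / 4}{2\ell b/n + (h_\ell/2)\, b \ell/n}
  = \frac{h_\ell^2 (\ell/n)\, b}{8 + 2 h_\ell}
  = \frac{\epsilon^2 b}{8 + 2 h_\ell}.
```

Since $h_\ell \le 1$, both exponents are at least $\epsilon^2 b / 10$, which is where a lower bound on $B$ of order $\log(1/\delta)/\epsilon^2$ enters.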

In the following, we use some additional notation. Let $LP_s(d)$ denote the partial linear program
defined on the variables up to time $s$, i.e. $(x_1, \ldots, x_s)$, with the right-hand side of the inequality
constraints set to $d$. That is,

\[
LP_s(d): \quad
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{s} \pi_t x_t & \\
\text{subject to} & \sum_{t=1}^{s} a_{it} x_t \le d_i, & i = 1, \ldots, m \\
 & 0 \le x_t \le 1, & t = 1, \ldots, s
\end{array}
\]

And let $\mathrm{OPT}_s(d)$ denote the optimal objective value of $LP_s(d)$.

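Lemma 6. With probability at least $1 - \epsilon$, for all $\ell \in L$:

\[
\sum_{t=1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) \ge (1 - 2h_\ell - \epsilon)\,\mathrm{OPT}_{2\ell}\Big(\frac{2\ell}{n} b\Big)
\]

given $B = \min_i b_i \ge \frac{20 m \log n}{\epsilon^2}$.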

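Proof. Let $\hat{b}_i = \sum_{j=1}^{2\ell} a_{ij} x_j(\hat{p}^{\ell})$ for $i$ such that $\hat{p}^{\ell}_i > 0$, and $\hat{b}_i = \frac{2\ell}{n} b_i$ otherwise. Then, note that
the solution pair $(\{x_t(\hat{p}^{\ell})\}_{t=1}^{2\ell}, \hat{p}^{\ell})$ satisfies all the complementarity conditions, and therefore is
an optimal primal-dual solution for the linear program $LP_{2\ell}(\hat{b})$:

\[
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{2\ell} \pi_t x_t & \\
\text{subject to} & \sum_{t=1}^{2\ell} a_{it} x_t \le \hat{b}_i, & i = 1, \ldots, m \\
 & 0 \le x_t \le 1, & t = 1, \ldots, 2\ell
\end{array}
\tag{23}
\]

This means

\[
\sum_{t=1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) = \mathrm{OPT}_{2\ell}(\hat{b}) \ge \Big(\min_i \frac{\hat{b}_i}{2\ell b_i / n}\Big)\,\mathrm{OPT}_{2\ell}\Big(\frac{2\ell}{n} b\Big) \tag{24}
\]

Now, let us analyze the ratio $\frac{\hat{b}_i}{2\ell b_i / n}$. By definition, for $i$ such that $\hat{p}^{\ell}_i = 0$, $\hat{b}_i = 2\ell b_i / n$. Otherwise,
using techniques similar to the proof of Lemma 5, we can prove that with probability $1 - \epsilon$,

\[
\hat{b}_i = \sum_{t=1}^{2\ell} a_{it} x_t(\hat{p}^{\ell}) \ge (1 - 2h_\ell - \epsilon)\frac{2\ell}{n} b_i. \tag{25}
\]

A detailed proof of inequality (25) appears in Appendix B.1. The lemma follows from
(25). □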



Next, we prove the following lemma relating the sample optimal to the optimal value of the
offline linear program:

Lemma 7. For any $\ell$,

\[
E\Big[\mathrm{OPT}_{\ell}\Big(\frac{\ell}{n} b\Big)\Big] \le \frac{\ell}{n}\,\mathrm{OPT}
\]

Proof. The proof is exactly the same as the proof of Lemma 4. □



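Proof of Theorem 1. Observe that the output of the online solution at time $t \in \{\ell + 1, \ldots, 2\ell\}$
is $x_t(\hat{p}^{\ell})$ as long as the constraints are not violated. By Lemma 5 and Lemma 6, with
probability at least $1 - 2\epsilon$:

\[
\sum_{t=\ell+1}^{2\ell} a_{it} x_t(\hat{p}^{\ell}) \le \frac{\ell}{n} b_i, \quad \text{for all } i \in \{1, \ldots, m\},\ \ell \in L
\]

\[
\sum_{t=1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) \ge (1 - 2h_\ell - \epsilon)\,\mathrm{OPT}_{2\ell}\Big(\frac{2\ell}{n} b\Big), \quad \text{for all } \ell \in L
\]

Denote this event by $\mathcal{E}$, where $\Pr(\mathcal{E}) \ge 1 - 2\epsilon$. Given this event, the expected objective value
achieved by the online solution satisfies:

\begin{align*}
& E\Big[\sum_{\ell \in L} \sum_{t=\ell+1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) \,\Big|\, \mathcal{E}\Big] \\
&\ge \sum_{\ell \in L} E\Big[\sum_{t=1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) \,\Big|\, \mathcal{E}\Big] - \sum_{\ell \in L} E\Big[\sum_{t=1}^{\ell} \pi_t x_t(\hat{p}^{\ell}) \,\Big|\, \mathcal{E}\Big] \\
&\ge \sum_{\ell \in L} (1 - 2h_\ell - \epsilon)\, E\Big[\mathrm{OPT}_{2\ell}\Big(\tfrac{2\ell}{n} b\Big) \,\Big|\, \mathcal{E}\Big] - \sum_{\ell \in L} E\Big[\mathrm{OPT}_{\ell}\Big(\tfrac{\ell}{n} b\Big) \,\Big|\, \mathcal{E}\Big] \\
&\ge \mathrm{OPT} - \sum_{\ell \in L} 2h_\ell\, E\Big[\mathrm{OPT}_{2\ell}\Big(\tfrac{2\ell}{n} b\Big) \,\Big|\, \mathcal{E}\Big] - \epsilon \sum_{\ell \in L} E\Big[\mathrm{OPT}_{2\ell}\Big(\tfrac{2\ell}{n} b\Big) \,\Big|\, \mathcal{E}\Big] - E\big[\mathrm{OPT}_{\epsilon n}(\epsilon b) \,\big|\, \mathcal{E}\big] \\
&\ge \mathrm{OPT} - \frac{1}{\Pr(\mathcal{E})} \sum_{\ell \in L} 2h_\ell\, E\Big[\mathrm{OPT}_{2\ell}\Big(\tfrac{2\ell}{n} b\Big)\Big] - \frac{\epsilon}{\Pr(\mathcal{E})} \sum_{\ell \in L} E\Big[\mathrm{OPT}_{2\ell}\Big(\tfrac{2\ell}{n} b\Big)\Big] - \frac{1}{\Pr(\mathcal{E})} E\big[\mathrm{OPT}_{\epsilon n}(\epsilon b)\big] \\
&\ge \mathrm{OPT} - \frac{4}{1 - 2\epsilon} \sum_{\ell \in L} \frac{h_\ell \ell}{n}\,\mathrm{OPT} - \frac{2\epsilon}{1 - 2\epsilon} \sum_{\ell \in L} \frac{\ell}{n}\,\mathrm{OPT} - \frac{\epsilon}{1 - 2\epsilon}\,\mathrm{OPT} \\
&\ge \mathrm{OPT} - \frac{13\epsilon}{1 - 2\epsilon}\,\mathrm{OPT}
\end{align*}

Here the third inequality telescopes the two sums over $\ell \in L$, the fourth uses $E[X \mid \mathcal{E}] \le E[X]/\Pr(\mathcal{E})$
for nonnegative $X$, the fifth uses Lemma 7, and the last inequality follows from the fact that

\[
\sum_{\ell \in L} \frac{\ell}{n} = 1 - \epsilon, \quad \text{and} \quad \sum_{\ell \in L} \frac{h_\ell \ell}{n} = \epsilon \sum_{\ell \in L} \sqrt{\frac{\ell}{n}} \le 2.5\epsilon
\]

Therefore,

\[
E\Big[\sum_{\ell \in L} \sum_{t=\ell+1}^{2\ell} \pi_t x_t(\hat{p}^{\ell})\Big] \ge E\Big[\sum_{\ell \in L} \sum_{t=\ell+1}^{2\ell} \pi_t x_t(\hat{p}^{\ell}) \,\Big|\, \mathcal{E}\Big] \Pr(\mathcal{E}) \ge (1 - 15\epsilon)\,\mathrm{OPT} \tag{26}
\]

Thus, Theorem 1 is proved. □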
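The two geometric sums used in the last step of the proof can be checked numerically. The short Python sketch below evaluates them for $\epsilon = 2^{-E}$ and $L = \{n\epsilon \cdot 2^r : r = 0, \ldots, E-1\}$; the function name `schedule_sums` and the choice of $n$ are illustrative assumptions.

```python
import math


def schedule_sums(E, n):
    """Return (eps, sum of ell/n, sum of h_ell*ell/n) over L, with eps = 2**-E."""
    eps = 2.0 ** (-E)
    sum_frac, sum_h = 0.0, 0.0
    for r in range(E):
        ell = n * eps * 2 ** r            # update time ell = 2^r * n * eps
        h = eps * math.sqrt(n / ell)      # h_ell as defined in (21)
        sum_frac += ell / n
        sum_h += h * ell / n
    return eps, sum_frac, sum_h
```

In closed form the second sum equals $\epsilon(1 - \sqrt{\epsilon})/(\sqrt{2} - 1)$, which stays below $2.5\epsilon$ for every $E$.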

Remark 2. To remove the dependence on $\log n$ in the lower bound on $B$ as in Corollary 1,
we use the observation that the number of points $n$ in the expression $n^m$ for the number of distinct
prices can be replaced by the number of "distinct points" among $\{(a_t, \pi_t)\}_{t=1}^{n}$, which can be reduced
to $O(\log_{1+\epsilon}(\lambda) \log^{m}_{1+\epsilon}(1/\epsilon))$ by a simple preprocessing of the input that introduces a multiplicative
error of at most $1 - \epsilon$. In this case, the condition on $B$ is

\[
B \ge \frac{20(m\lambda + m^2 \log(1/\epsilon))}{\epsilon^2}
\]


4 Extensions and Conclusions 

We present a few extensions and implications of our results.

4.1 Online multi-dimensional linear program

We consider the following more general online linear programs with multidimensional decisions
$x_t \in \mathbb{R}^k$ at each step, as defined in Section 1:

\[
\begin{array}{lll}
\text{maximize} & \sum_{t=1}^{n} f_t^T x_t & \\
\text{subject to} & \sum_{t=1}^{n} g_{it}^T x_t \le b_i, & i = 1, \ldots, m \\
 & x_t^T e \le 1,\ x_t \ge 0, & \forall t
\end{array}
\tag{27}
\]



Our online algorithm remains essentially the same (as described in Section 3), with $x_t(p)$ now
defined as follows:

\[
x_t(p) =
\begin{cases}
0 & \text{if } f_{tj} \le \sum_i p_i g_{itj} \text{ for all } j, \\
e_r & \text{otherwise, where } r \in \arg\max_j \big(f_{tj} - \sum_i p_i g_{itj}\big)
\end{cases}
\tag{28}
\]
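The allocation rule (28) can be sketched in a few lines of Python. The function name `allocate` and the list-based data layout (`f_t` of length $k$, `g_t` as an $m \times k$ nested list) are assumptions made for illustration; ties in the arg max are broken by taking the smallest index.

```python
def allocate(f_t, g_t, p):
    """Rule (28): return the zero vector or a unit vector e_r of length k."""
    k = len(f_t)
    # Reduced value of option j: f_tj - sum_i p_i * g_itj
    reduced = [f_t[j] - sum(p[i] * g_t[i][j] for i in range(len(p))) for j in range(k)]
    x = [0] * k
    best = max(range(k), key=lambda j: reduced[j])   # first index attaining the maximum
    if reduced[best] > 0:                            # otherwise every option is priced out
        x[best] = 1
    return x
```

For example, with rewards `[3, 5]`, resource use `[[1, 2], [1, 1]]` and prices `[1, 1]`, the reduced values are `[1, 2]`, so the second option is chosen; doubling the prices makes both reduced values negative and the rule returns the zero vector.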

Using the complementarity conditions of (27) and the lower bound condition on $B$ assumed
in Theorem 2, we can prove the following lemmas; the proofs are very similar to the proofs for the
one-dimensional case, and are provided in the appendix.

Lemma 8. $x^*_t$ and $x_t(p^*)$ differ in at most $m$ values of $t$.

Lemma 9. Define $p$ and $q$ to be distinct if $x_t(p) \ne x_t(q)$ for some $t$. Then, there are at most $n^m k^{2m}$ distinct
price vectors.

With the above lemmas, the proof of Theorem 2 follows exactly as the proof of Theorem 1.

4.2 Online integer programs

From the definition of $x_t(p)$ in (11) for linear programs, the algorithm always outputs integer
solutions. Since the competitive ratio analysis compares the online solution to the
optimal solution of the corresponding linear relaxation, the competitive ratio stated in Theorem
1 also holds for online integer programs. The same observation holds for the general online
linear programs introduced in Section 4.1, since the algorithm also outputs integer solutions there. Our result
supports a common belief: when relatively sufficient resource quantities are to be allocated to
a large number of small demands, linear programming solutions possess a small gap to integer
programming solutions.

4.3 Fast solution of large linear programs by column sampling 

Apart from online problems, our algorithm can also be applied to solving (offline) linear programs
that are too large to consider all the variables explicitly. Similar to the one-time learning
online solution, one could randomly sample a small subset of $n\epsilon$ variables, and use the dual
solution $\hat{p}$ of this smaller program to set the values of the variables $x_j$ as $x_j(\hat{p})$. This approach is
very similar to the popular column generation method used for solving large linear programs [6].
Our result provides the first rigorous analysis of the approximation achieved by the approach of
reducing the linear program size by randomly selecting a subset of columns.
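A minimal Python sketch of this column-sampling idea, again restricted to a single constraint ($m = 1$, $a_j > 0$) so that the sampled dual price has a closed greedy form. The names `threshold_price` and `solve_by_column_sampling`, the scaled right-hand side $(1-\epsilon)\frac{s}{n} b$ for a sample of size $s$, and the final greedy feasibility guard are assumptions of this sketch rather than part of the analysis above.

```python
import math
import random


def threshold_price(pi, a, capacity):
    """Dual price of a single-constraint LP via a greedy fractional-knapsack fill."""
    order = sorted(range(len(pi)), key=lambda j: pi[j] / a[j], reverse=True)
    used = 0.0
    for j in order:
        used += a[j]
        if used >= capacity:
            return pi[j] / a[j]
    return 0.0


def solve_by_column_sampling(pi, a, b, eps, seed=0):
    """Learn a price on a random sample of columns, then set x_j = x_j(price) for all j."""
    n = len(pi)
    rng = random.Random(seed)
    sample = rng.sample(range(n), max(1, math.ceil(eps * n)))
    capacity = (1 - eps) * len(sample) / n * b       # scaled-down right-hand side (assumed)
    price = threshold_price([pi[j] for j in sample], [a[j] for j in sample], capacity)
    x, used = [0] * n, 0.0
    for j in range(n):
        # Threshold rule, with a guard that keeps the returned solution feasible.
        if pi[j] > price * a[j] and used + a[j] <= b:
            x[j] = 1
            used += a[j]
    return price, x
```

Only the sampled columns enter the price computation, so the expensive step scales with $n\epsilon$ rather than $n$; the pass over all columns is a single threshold comparison each.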

To conclude, we have provided a $1 - o(1)$ competitive algorithm for a general class of online
linear programming problems under the assumption of random order of arrival and some mild
conditions on the right-hand-side input. These conditions are independent of the optimal objective
value, the objective function coefficients, and the distribution of the input data. The applications of
this algorithm include various online resource allocation problems for which near-optimal bounds
are typically very hard to obtain in the online context. This is the first near-optimal algorithm for
general online optimization problems.

Our dynamic learning-based algorithm works by dynamically updating a threshold price
vector at geometric time intervals. This geometric learning frequency may also be of interest to
the statistical and machine learning communities. It essentially indicates that it may be
bad not only to react too slowly, but also to react too fast.

There are many remaining questions. Could the condition on the right-hand-side vector be
further improved? Could we handle linear programming with both buy and sell customers, that
is, where some coefficients $a_{ij}$ are negative? Could our online learning-based algorithm be adapted to
approximately solve more dynamic and stochastic programming problems?

A Supporting lemmas for Section 2

A.1 Hoeffding-Bernstein inequality

By Theorem 2.14.19 in [15] : 

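Lemma 10. [Hoeffding-Bernstein inequality] Let $u_1, u_2, \ldots, u_r$ be a random sample without
replacement from the real numbers $\{c_1, c_2, \ldots, c_R\}$. Then for every $t > 0$,

\[
P\Big(\Big|\sum_{i=1}^{r} u_i - r\bar{c}\Big| \ge t\Big) \le 2\exp\Big(-\frac{t^2}{2 r \sigma_R^2 + t \Delta_R}\Big) \tag{29}
\]

where $\Delta_R = \max_i c_i - \min_i c_i$, $\bar{c} = \frac{1}{R}\sum_{i=1}^{R} c_i$, and $\sigma_R^2 = \frac{1}{R}\sum_{i=1}^{R} (c_i - \bar{c})^2$.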
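
To make the quantities in (29) concrete, the Python sketch below evaluates the right-hand side of the bound for a given population and compares it with a simulated tail frequency under sampling without replacement. The function names, the number of trials, and the seed are illustrative assumptions; the simulation is only a sanity check, not a proof.

```python
import math
import random


def hoeffding_bernstein_bound(values, r, t):
    """Right-hand side of (29) for a sample of size r from the population `values`."""
    R = len(values)
    mean = sum(values) / R
    var = sum((c - mean) ** 2 for c in values) / R     # sigma_R^2
    spread = max(values) - min(values)                 # Delta_R
    return 2 * math.exp(-t * t / (2 * r * var + t * spread))


def empirical_tail(values, r, t, trials=2000, seed=0):
    """Fraction of simulated samples whose sum deviates from r * mean by at least t."""
    rng = random.Random(seed)
    mean = sum(values) / len(values)
    hits = 0
    for _ in range(trials):
        sample_sum = sum(rng.sample(values, r))        # sampling without replacement
        if abs(sample_sum - r * mean) >= t:
            hits += 1
    return hits / trials
```

For a population of fifty zeros and fifty ones with $r = 50$ and $t = 8$, the bound evaluates to $2\exp(-64/33)$, and the simulated frequency should fall well below it.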

A.2 Proof of Lemma 1

Let $x^*, p^*$ be an optimal primal-dual solution of the offline problem (1). From the KKT conditions of
(1), $x^*_t = 1$ if $(p^*)^T a_t < \pi_t$, and $x^*_t = 0$ if $(p^*)^T a_t > \pi_t$. Therefore, $x_t(p^*) = x^*_t$ if $(p^*)^T a_t \ne \pi_t$.
By Assumption 5, there are at most $m$ values of $t$ such that $(p^*)^T a_t = \pi_t$.

A.3 Proof of inequality (18)

We prove that with probability $1 - \epsilon$

\[
\hat{b}_i = \sum_{t \in N} a_{it} x_t(\hat{p}) \ge (1 - 3\epsilon) b_i
\]

given $\sum_{t \in S} a_{it} x_t(\hat{p}) \ge (1 - 2\epsilon)\epsilon b_i$. The proof is very similar to the proof of Lemma 2. Fix a price
vector $p$ and $i$. Define a permutation to be "bad" for $p, i$ if both (a) $\sum_{t \in S} a_{it} x_t(p) \ge (1 - 2\epsilon)\epsilon b_i$
and (b) $\sum_{t \in N} a_{it} x_t(p) \le (1 - 3\epsilon) b_i$ hold.

Define $Y_t = a_{it} x_t(p)$. Then, the probability of bad permutations is bounded by:

\[
\Pr\Big(\Big|\sum_{t \in S} Y_t - \epsilon \sum_{t \in N} Y_t\Big| \ge \epsilon^2 b_i \,\Big|\, \sum_{t \in N} Y_t \le (1 - 3\epsilon) b_i\Big) \le 2\exp\Big(-\frac{b_i \epsilon^3}{2 + \epsilon}\Big) \le \frac{\epsilon}{m \cdot n^m} \tag{30}
\]

where the last inequality follows from the assumption that $b_i \ge \frac{6 m \log(n/\epsilon)}{\epsilon^3}$. Summing over the $n^m$
distinct prices and $i = 1, \ldots, m$, we get the desired inequality.



B Supporting lemmas for Section 3

B.1 Proof of inequality (25)

Proof. The proof is very similar to the proof of Lemma 5. Fix a $p$, $\ell$ and $i \in \{1, \ldots, m\}$. Define
"bad" permutations for $p, i, \ell$ as those permutations such that all the following conditions hold:
(a) $p = \hat{p}^{\ell}$, that is, $p$ is the price learned as the optimal dual solution for (20), (b) $p_i > 0$, and (c)
$\sum_{t=1}^{2\ell} a_{it} x_t(p) \le (1 - 2h_\ell - \epsilon)\frac{2\ell}{n} b_i$. We will show that the probability of these bad permutations
is small.

Define $Y_t = a_{it} x_t(p)$. If $p$ is an optimal dual solution for (20), and $p_i > 0$, then by the KKT
conditions the $i$-th inequality constraint holds with equality. Therefore, by the observation made in
Lemma 1, we have:

\[
\sum_{t=1}^{\ell} Y_t = \sum_{t=1}^{\ell} a_{it} x_t(p) \ge (1 - h_\ell)\frac{\ell}{n} b_i - m \ge (1 - h_\ell - \epsilon)\frac{\ell}{n} b_i \tag{31}
\]

where the last inequality follows from the assumption on $B = \min_i b_i$ and $\ell \ge n\epsilon$.
Therefore, the probability of "bad" permutations for $p, i, \ell$ is bounded by:

\[
P\Big(\Big|\sum_{t=1}^{\ell} Y_t - \tfrac{1}{2}\sum_{t=1}^{2\ell} Y_t\Big| \ge h_\ell \tfrac{\ell}{n} b_i \,\Big|\, \sum_{t=1}^{2\ell} Y_t \le (1 - 2h_\ell - \epsilon)\tfrac{2\ell}{n} b_i\Big) \le 2\exp\Big(-\frac{\epsilon^2 b_i}{2 + h_\ell}\Big) \le \delta
\]

where $\delta = \frac{\epsilon}{m \cdot n^m \cdot E}$. The last inequality follows from the assumption on $B$. Next, taking a
union bound over the $n^m$ "distinct" $p$'s, $i = 1, \ldots, m$, and the $E$ values of $\ell$, we conclude that with
probability $1 - \epsilon$

\[
\sum_{t=1}^{2\ell} a_{it} x_t(\hat{p}^{\ell}) \ge (1 - 2h_\ell - \epsilon)\frac{2\ell}{n} b_i
\]

for all $i$ such that $\hat{p}^{\ell}_i > 0$ and all $\ell$. □



C Online multi-dimensional linear program

C.1 Proof of Lemma 8

Proof. Using Lagrangian duality, observe that given the optimal dual solution $p^*$, the optimal solution
$x^*$ is given by:

\[
\begin{array}{ll}
\text{maximize} & f_t^T x_t - \sum_i p^*_i g_{it}^T x_t \\
\text{subject to} & x_t^T e \le 1,\ x_t \ge 0
\end{array}
\tag{32}
\]

Therefore, it must be true that if $x^*_{tr} = 1$, then $r \in \arg\max_j \big(f_{tj} - (p^*)^T g_{tj}\big)$ and $f_{tr} - (p^*)^T g_{tr} \ge 0$.
This means that for $t$'s such that $\max_j \big(f_{tj} - (p^*)^T g_{tj}\big)$ is strictly positive and $\arg\max_j$ returns
a unique solution, $x_t(p^*)$ and $x^*_t$ are identical. By a random perturbation argument there can be
at most $m$ values of $t$ which do not satisfy this condition (for each such $t$, $p$ satisfies an equation
$f_{tj} - p^T g_{tj} = f_{tl} - p^T g_{tl}$ for some $j, l$, or $f_{tj} - p^T g_{tj} = 0$ for some $j$). This means $x^*$ and
$x_t(p^*)$ differ in at most $m$ positions. □

C.2 Proof of Lemma 9

Proof. Consider the $nk^2$ expressions

\[
\begin{array}{ll}
f_{tj} - p^T g_{tj} - (f_{tl} - p^T g_{tl}), & 1 \le j, l \le k,\ j \ne l,\ 1 \le t \le n \\
f_{tj} - p^T g_{tj}, & 1 \le j \le k,\ 1 \le t \le n
\end{array}
\]

$x_t(p)$ is completely determined once we determine the subset of expressions out of these $nk^2$
expressions that are assigned a non-negative value. By the theory of computational geometry, there
can be at most $(nk^2)^m$ such distinct assignments. □